Pattern-Oriented Hierarchical Clustering

Authors

  • Tadeusz Morzy
  • Marek Wojciechowski
  • Maciej Zakrzewicz
Abstract

Clustering is a data mining method that discovers interesting data distributions in very large databases. Its applications include customer segmentation, catalog design, store layout, and stock market segmentation. In this paper, we consider the problem of discovering similarity-based clusters in a large database of event sequences. We introduce a hierarchical algorithm that uses sequential patterns found in the database to efficiently generate both the clustering model and the data clusters. The algorithm iteratively merges smaller, similar clusters into larger ones until the requested number of clusters is reached. In the absence of a well-defined metric space, we propose a similarity measure for use in cluster merging. The advantage of the proposed measure is that evaluating inter-cluster similarities requires no additional access to the source database.
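The merging loop described in the abstract can be illustrated with a minimal sketch. It assumes each initial cluster is seeded by one sequential pattern together with the set of ids of the sequences that contain it, and it uses the Jaccard coefficient of those id sets as the inter-cluster similarity. Both choices, and the names `jaccard`, `merge_clusters`, and `pattern_support`, are illustrative assumptions; the abstract does not spell out the paper's actual measure. Because similarity is computed only from the stored id sets, the sketch never goes back to the source sequences, which mirrors the property the abstract highlights.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard coefficient of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_clusters(pattern_support, k):
    """Agglomeratively merge pattern-based clusters until k remain.

    pattern_support maps each (hashable) pattern to the set of ids of
    the sequences that contain it. This input shape is an assumption.
    """
    # One initial cluster per pattern: (patterns, supporting sequence ids).
    clusters = [({p}, set(ids)) for p, ids in pattern_support.items()]
    # Stop at k clusters; never try to merge below a single cluster.
    while len(clusters) > max(k, 1):
        # Pick the most similar pair using only the stored id sets.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: jaccard(clusters[ij[0]][1], clusters[ij[1]][1]),
        )
        pats_i, ids_i = clusters[i]
        pats_j, ids_j = clusters[j]
        merged = (pats_i | pats_j, ids_i | ids_j)
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)
    return clusters


if __name__ == "__main__":
    # Hypothetical patterns and sequence ids, for illustration only.
    support = {
        "a->b": {1, 2, 3},
        "b->c": {2, 3, 4},
        "x->y": {7, 8},
        "y->z": {8, 9},
    }
    for patterns, ids in merge_clusters(support, 2):
        print(sorted(patterns), sorted(ids))
```

A sequence may support patterns in more than one cluster, so the resulting clusters can overlap; how such sequences are finally assigned is left open here. The exhaustive pair search is quadratic per merge step and is meant for clarity, not for very large pattern sets.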

Similar resources

Hierarchical Clustering of Distributed Object-Oriented Software Systems: A Generic Solution for Software-Hardware Mismatch Problem

During the software lifecycle, the software structure is subject to many changes in order to fulfill the customer’s requirements. In Distributed Object Oriented systems, software engineers face many challenges to solve the software-hardware mismatch problem in which the software structure does not match the customer’s underlying hardware. A major design problem of Object Oriented software syste...

Analyzing Motorcycle Crash Pattern and Riders’ Fault Status at a National Level: A Case Study from Iran

Motorcycle crashes constitute a significant proportion of traffic accidents all over the world. The aim of this paper was to examine motorcycle crash patterns and rider fault status across the provinces of Iran. For this purpose, 6638 motorcycle crashes that occurred in Iran from 2009 to 2012 were used as the analysis data, and a two-step clustering approach was adopted as the analysis framework....

Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories

In recent years, the rapid growth of spatial trajectory data, together with the need to process it and extract useful information and meaningful patterns, has drawn many researchers to the field of spatio-temporal trajectory clustering. The processing and analysis of these trajectories have resulted in the extraction of useful information whic...

Hierarchical Clustering Based Automatic Refactorings Detection

The structure of software systems is subject to many changes during the system's lifecycle. A continuous improvement of the software system's structure can be made using refactoring, which ensures a clean and easy-to-maintain software structure. In this paper we focus on the problem of restructuring object-oriented software systems using hierarchical clustering. We propose two hierarchical c...

Hierarchical Clustering Techniques in Data Mining: a Comparative Study

Clustering is an important data mining technique that has gained tremendous importance in recent times because it captures the hidden structure of data. In clustering, objects that are similar in their characteristics are brought together into a group. Hierarchical Clustering Analysis is one of the clustering techniques which play a significan...

Graph Clustering by Hierarchical Singular Value Decomposition with Selectable Range for Number of Clusters Members

Graphs have many applications in real-world problems. When we deal with huge volumes of data, analyzing the data is difficult or sometimes impossible. In big data problems, clustering is a useful tool for data analysis. Singular value decomposition (SVD) is one of the best algorithms for clustering graphs, but it gives us no way to select the number of clusters and the number of members ...

Publication date: 1999